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In this report, we derive a non-negative series expansion for the Jensen-Shannon divergence (JSD) 
between two probability distributions. This series expansion is shown to be useful for numerical 
calculations of the JSD, when the probability distributions are nearly equal, and for which, conse- 
quently, small numerical errors dominate evaluation. 
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I. INTRODUCTION 



The Jensen-Shannon divergence (JSD) has been widely used as a dissimilarity measure between weighted probability 
distributions. The direct numerical evaluation of the exact expression for the JSD (involving difference of logarithms) , 
however, leads to numerical errors when the distributions are close to each other (small JSD); when the element-wise 
difference between the distributions is O(10 _1 ), this naive formula produces erroneous values (sometimes negative) 
when used for numerical calculations. In this report, we derive a provably non-negative series expansion for the JSD 
which can be used in the small JSD limit, where the naive formula fails. 



II. SERIES EXPANSION FOR JENSEN-SHANNON DIVERGENCE 



Consider two discrete probability distributions p 1 and p 2 over a sample space S of cardinality N with relative 
normalized weights -K\ and 7r 2 between them. The JSD between the distributions is then defined as [T] 



A nai „ e [pi, p 2 ; tti, vr 2 ] = H [ttiPi + 7r 2 p 2 ] - (izxH^x] + 7r 2 iJ[p 2 ]) 
where the entropy (measured in nats) of a probability distribution is defined as 
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and 



= + «)(1 + e 3 ) \og{p 3 {\ + ej)) - \pA 1 ~ - ^) log(Pj(l - e,0) 

+ aej) log(pj) - ^(1 + aej) log(l - e|) - i^(a + £,) log 



Thus, 



H^lPlj + Tl2P2j) - (7Ti/l(py) + TT 2 h(p2 3 )) = ~Pj 



The Taylor series expansion of the logarithm function is given as 
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log(l + z) =E< 
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The logarithms in the expression for the J-S divergence can then be written as 

oo 

l0g(l + £j) = E^4 

i=l 
oo 

log(l- £j ) = ^-ifae) 

i=l 

OO 

log(l + aej) = ^c^'ej. 

i=l 

We then have A = | SjliPj^e with 

^ = (1 + aej) [log(l + Ej -) + log(l - e 3 ) - 21og(l + ae 3 )} + (a + Sj ) [log(l + e 3 ) - log(l - e 3 )] 
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= E Cl [{(-l) l -2a l + a + (-l) 4+1 a + l}e;. + {(-l) l a-2a i; + 1 + l + (-l) J + 1 + a}e} +1 ] . (9) 
i=i 

When i = 1, coeff(ej) = ci(— 1 — 2a + a + a + 1) = 0. The first non-vanishing term in the expansion is then of 
order 2. Shifting indices of the first term in Eqn. ^ gives 
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where 



l-a + (-l)' 1+1 (l + a-2a l + 1 ) _ J 2(1 - a J+1 )/ (*(< + 1)) i odd, 



-2(a-a 2+1 )/(i(i+l)) i even. 
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FIG. 1: Plot comparing the naive and approximate formulae, 
truncated at different orders for calculating JSD as a function 
of the normalized L2-distance (||e||; see Section III) between 
pairs of randomly generated probability distributions. Best fit 
slopes are: -2.05 (k = 3), -5.89 (k = 6), -8.14 (k = 9), -11.91 
(k — 12) and -105.43 (comparing naive with k — 100). 



FIG. 2: Probability of obtaining (erroneous) negative values, 
when directly evaluating JSD using its exact expression, is 
plotted as a function of ||e||. When implemented in matlab, 
we observe that the naive formula gives negative JSD when 
||e|| is merely of O(10" 6 ). 



This series expansion can be further simplified as 

oo 

5j = y~] (B 2 i-i + B 2i Sj) e 



i=l 



i=l 

B 2l (-1 - I 



. r J = -{2i+i) a£j - (13) 

Since —1 < aej < 1, we have —1 ^ - Ej ^ 1. Thus, for every i, (B 2 i-i + BuB^e 2 ? ^ 0, making Sj — and the 
series expansion for A na i ve — non-negative up to all orders. 



III. NUMERICAL RESULTS 



The accuracy of the truncated series expansion can be compared with the naive formula by measuring the JSD 
between randomly generated probability distributions. Pairs of probability distributions with —4 ^ log 10 ||e|| < 0, 

lY, N _ e 2 

where ||er|| = y — J ^ f 1 3 , were randomly generated and the J-S divergence between each pair was calculated by both a 
direct evaluation of the exact expression (A naive ) and the approximate expansion (A fe ; k e {3, 6, 9, 12}), where 

N k 

A fc =2$>% 5 *i* = 5><4 +1 . (14) 

j=l i=l 

The results shown in Fig. [TJsuggest the series expansion to be a more numerically useful formula when the probability 
distributions differ by |je|j ~ O(10 -0 5 ). Fig. [2 further shows that when ||e|| ~ O(10 -6 ), a direct evaluation of the 
exact formula for JSD gives negative values (when implemented in matlab). 



APPENDIX 



Here we include the matlab code used in the figures for approximate evaluation of JSD using its series expansion. 
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function [ JS , epsnorm] = JSapprx(pil, pi, pi2,p2, order) 

7, [ JS, epsnorm] =JSapprx (pi 1, pi, pi2,p2, order) calculates JS 

7 divergence given two probability distributions and 

7 their relative weights. JSapprx uses an approximation 

7o to the JSD by expanding in powers of epsilon=(pl-p2)/(pl+p2) 

7o and truncating at an order input by the user. 

7. 

7o This calculation is described in the technical report 

7o ' 'A non-negative expansion 

7o for small Jensen-Shannon Divergences'' 

7, by Anil Raj and Chris H. Wiggins, October 2008 

% average of distributions 

pbar=(pl+p2)/2; 

7o difference of distributions 

eta=(pl-p2)/2; 

7o ratio of difference to average 
epsilon=eta. /pbar ; 

7o difference in biases, where pil+pi2=l 
alpha=pil-pi2; 

7» calculate JS by summing up to order 'order' 
js=zeros (size (pbar)) ; 

7, denominator computed by summing, as well 
denominator=0 ; 
for i=2 : order 

denominator=denominator+(i-l) ; 

7. numerical coefficient 

c=(-l)~i*(l/denominator) ; 

Bi=c* (alpha" (mod(i , 2) ) -alpha" i) ; 

js=js+Bi*(epsilon. ~i) ; 
end 

7o sum over ' j ' : 
JS=pbar'*js/2; 

7. convert from nats to bits: 
JS=JS/log(2) ; 

7o norm of epsilon reported as output 
if nargout==2 

epsnorm=sqrt (sum(epsilon. ~2)/length(pbar) ) ; 
end 
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